Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk
This paper introduces an open-ended sequential algorithm for computing the
p-value of a test using Monte Carlo simulation. It guarantees that the
resampling risk, the probability of a different decision than the one based on
the theoretical p-value, is uniformly bounded by an arbitrarily small constant.
Previously suggested sequential or non-sequential algorithms, using a bounded
sample size, do not have this property. Although the algorithm is open-ended,
the expected number of steps is finite, except when the p-value is on the
threshold between rejecting and not rejecting. The algorithm is suitable as
standard for implementing tests that require (re-)sampling. It can also be used
in other situations: to check whether a test is conservative, iteratively to
implement double bootstrap tests, and to determine the sample size required for
a certain power.
Comment: Major revision; 15 pages, 4 figures.
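The sequential idea can be illustrated with a deliberately naive sketch: keep resampling until a crude Hoeffding confidence band for the running p-value estimate lies entirely on one side of the threshold. This is not the paper's construction (which uses carefully designed stopping boundaries to bound the resampling risk uniformly); all names and parameters here are invented for illustration.

```python
import math
import random

def sequential_mc_pvalue(exceeds, alpha=0.05, eps=1e-3, max_steps=100_000):
    """Naive sequential Monte Carlo test (illustrative sketch only).

    `exceeds()` draws one resample and reports whether its statistic is at
    least as extreme as the observed one. We stop as soon as a Hoeffding
    band around the running p-value estimate clears `alpha`.
    """
    hits = 0
    for n in range(1, max_steps + 1):
        hits += exceeds()
        p_hat = hits / n
        # Crude per-step Hoeffding half-width; NOT the paper's boundaries,
        # so this does not achieve a uniformly bounded resampling risk.
        half = math.sqrt(math.log(2.0 / eps) / (2.0 * n))
        if p_hat + half < alpha:
            return "reject", p_hat, n
        if p_hat - half > alpha:
            return "accept", p_hat, n
    return "undecided", hits / max_steps, max_steps

random.seed(0)
# Toy case: the true p-value is 0.5, far above alpha, so sampling stops early.
decision, p_hat, steps = sequential_mc_pvalue(lambda: random.random() < 0.5)
```

Because the band here is computed afresh at each step, the overall error is not controlled uniformly over stopping times; the paper's contribution is precisely a boundary sequence that makes the risk bound uniform.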
Composite Correlation Quantization for Efficient Multimodal Retrieval
Efficient similarity retrieval from large-scale multimodal databases is
pervasive in modern search engines and social networks. To support queries
across content modalities, the system should enable cross-modal correlation and
computation-efficient indexing. While hashing methods have shown great
potential in achieving this goal, current attempts generally fail to learn
isomorphic hash codes in a seamless scheme: they embed multiple modalities in a
continuous isomorphic space and then separately threshold the embeddings into
binary codes, which incurs a substantial loss of retrieval accuracy. In this
paper, we approach seamless multimodal hashing by proposing a novel Composite
Correlation Quantization (CCQ) model. Specifically, CCQ jointly finds
correlation-maximal mappings that transform different modalities into
isomorphic latent space, and learns composite quantizers that convert the
isomorphic latent features into compact binary codes. An optimization framework
is devised to preserve both intra-modal similarity and inter-modal correlation
through minimizing both reconstruction and quantization errors, which can be
trained from both paired and partially paired data in linear time. A
comprehensive set of experiments shows the superior effectiveness and
efficiency of CCQ against state-of-the-art hashing methods for both unimodal
and cross-modal retrieval.
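A minimal sketch of the map-then-quantize pipeline: a toy stand-in in which least-squares projections onto a shared latent space play the role of CCQ's jointly learned correlation-maximal mappings, and sign thresholding plays the role of the composite quantizers. All data, dimensions, and the use of a known latent signal are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: two "modalities" sharing a common latent signal.
n, d1, d2, bits = 200, 8, 6, 4
latent = rng.standard_normal((n, bits))
X = latent @ rng.standard_normal((bits, d1)) + 0.1 * rng.standard_normal((n, d1))
Y = latent @ rng.standard_normal((bits, d2)) + 0.1 * rng.standard_normal((n, d2))

# Stand-in for correlation-maximal mappings: project each modality onto the
# shared latent space by least squares (CCQ learns these jointly instead).
Wx, *_ = np.linalg.lstsq(X, latent, rcond=None)
Wy, *_ = np.linalg.lstsq(Y, latent, rcond=None)

# Quantize the isomorphic latent features into compact binary codes.
codes_x = (X @ Wx > 0).astype(int)
codes_y = (Y @ Wy > 0).astype(int)

# Paired items should receive nearly identical codes (small Hamming distance),
# which is what makes cross-modal retrieval by code comparison work.
hamming = (codes_x != codes_y).mean()
```

In this sketch the latent space is known, so mapping and quantization can be done separately without much loss; the paper's point is that when the space must be learned, decoupling the two steps is what degrades accuracy.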
Energy-based temporal neural networks for imputing missing values
Imputing missing values in high dimensional time series is a difficult problem. There have been some approaches to the problem [11,8] where neural architectures were trained as probabilistic models of the data. However, we argue that this approach is not optimal. We propose to view temporal neural networks with latent variables as energy-based models and train them for missing value recovery directly. In this paper we introduce two energy-based models. The first model is based on a one-dimensional convolution and the second model utilizes a recurrent neural network. We demonstrate how ideas from the energy-based learning framework can be used to train these models to recover missing values. The models are evaluated on a motion capture dataset.
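The direct-recovery idea can be illustrated with a toy energy-based imputation: in place of the paper's neural models, a hand-written smoothness energy is minimized over the missing entries while observed entries stay clamped. Everything here (the energy, the data, the step sizes) is an invented illustration, not the paper's method.

```python
import numpy as np

def impute_by_energy_min(series, mask, lam=1.0, lr=0.05, steps=3000):
    """Fill missing entries of a 1-D series by gradient descent on an energy.

    Energy = lam * sum of squared second differences (a smoothness prior);
    the missing values are the free variables, observed values are clamped.
    """
    x = np.where(mask, series, series[mask].mean())
    for _ in range(steps):
        core = x[:-2] - 2.0 * x[1:-1] + x[2:]      # second differences
        grad = np.zeros_like(x)
        grad[:-2] += core                          # d(core_i)/d(x_i)
        grad[1:-1] -= 2.0 * core                   # d(core_i)/d(x_{i+1})
        grad[2:] += core                           # d(core_i)/d(x_{i+2})
        x = x - lr * 2.0 * lam * grad
        x[mask] = series[mask]                     # clamp observed values
    return x

t = np.linspace(0, 2 * np.pi, 50)
true_vals = np.sin(t)
mask = np.ones(50, dtype=bool)
mask[20:25] = False                                # a gap of 5 missing values
imputed = impute_by_energy_min(true_vals, mask)
err = np.abs(imputed[~mask] - true_vals[~mask]).max()
```

The paper's models replace this fixed smoothness energy with learned convolutional and recurrent energies, but the training target is the same: recover the held-out values directly.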
Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks
Undirected graphical models are widely used in statistics, physics and
machine vision. However Bayesian parameter estimation for undirected models is
extremely challenging, since evaluation of the posterior typically involves the
calculation of an intractable normalising constant. This problem has received
much attention, but very little of this has focussed on the important practical
case where the data consists of noisy or incomplete observations of the
underlying hidden structure. This paper specifically addresses this problem,
comparing two alternative methodologies. In the first of these approaches
particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently
explore the parameter space, combined with the exchange algorithm (Murray et
al., 2006) for avoiding the calculation of the intractable normalising constant
(a proof showing that this combination targets the correct distribution is
found in a supplementary appendix online). This approach is compared with
approximate Bayesian computation (Pritchard et al., 1999). Applications to
estimating the parameters of Ising models and exponential random graphs from
noisy data are presented. Each algorithm used in the paper targets an
approximation to the true posterior due to the use of MCMC to simulate from the
latent graphical model, in lieu of being able to do this exactly in general.
The supplementary appendix also describes the nature of the resulting
approximation.
Comment: 26 pages, 2 figures; accepted in the Journal of Computational and
Graphical Statistics (http://www.amstat.org/publications/jcgs.cfm).
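The exchange algorithm's key trick, that the intractable normalising constant cancels in the acceptance ratio against an auxiliary draw from the proposed parameter, can be sketched on a toy exponential-family model where exact simulation is easy. The Bernoulli model and all settings below are invented for illustration; real applications (Ising models, random graphs) need MCMC for the auxiliary draw, which is exactly the approximation the paper's appendix discusses.

```python
import math
import random

random.seed(1)

def sample_model(theta, n):
    """Exact simulation: each observation is 1 with probability sigmoid(theta),
    i.e. p(x | theta) = exp(theta * x) / (1 + exp(theta))."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return [1 if random.random() < p else 0 for _ in range(n)]

theta_true, n = 1.0, 200
x_obs = sample_model(theta_true, n)
s_x = sum(x_obs)                                  # sufficient statistic

def exchange_mcmc(iters=5000, step=0.5):
    """Exchange algorithm (Murray et al., 2006), flat prior, random walk."""
    theta, chain = 0.0, []
    for _ in range(iters):
        theta_p = theta + random.gauss(0.0, step)
        s_y = sum(sample_model(theta_p, n))       # auxiliary data at theta_p
        # The ratio p(x|theta')p(y|theta) / (p(x|theta)p(y|theta')) --
        # the normalising constants cancel exactly:
        log_ratio = (theta_p - theta) * (s_x - s_y)
        if random.random() < math.exp(min(0.0, log_ratio)):
            theta = theta_p
        chain.append(theta)
    return chain

chain = exchange_mcmc()
post_mean = sum(chain[1000:]) / len(chain[1000:])   # burn-in discarded
```

With 200 observations the posterior concentrates near the true parameter, so the chain's mean should land close to `theta_true = 1.0`.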
A Bayesian reassessment of nearest-neighbour classification
The k-nearest-neighbour procedure is a well-known deterministic method used
in supervised classification. This paper proposes a reassessment of this
approach as a statistical technique derived from a proper probabilistic model;
in particular, we modify the assessment made in a previous analysis of this
method undertaken by Holmes and Adams (2002,2003), and evaluated by Manocha and
Girolami (2007), where the underlying probabilistic model is not completely
well-defined. Once a clear probabilistic basis for the k-nearest-neighbour
procedure is established, we derive computational tools for conducting Bayesian
inference on the parameters of the corresponding model. In particular, we
assess the difficulties inherent to pseudo-likelihood and to path sampling
approximations of an intractable normalising constant, and propose a perfect
sampling strategy to implement a correct MCMC sampler associated with our
model. If perfect sampling is not available, we suggest using a Gibbs sampling
approximation. Illustrations of the performance of the corresponding Bayesian
classifier are provided for several benchmark datasets, demonstrating in
particular the limitations of the pseudo-likelihood approximation in this
set-up.
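The pseudo-likelihood approximation the paper critiques can be sketched for a Holmes-Adams-style kNN model, in which the full conditional of each label depends on the agreement with its k nearest neighbours. The data, the exact conditional form, and the grid search below are simplified illustrations, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D features, two well-separated classes.
x = np.concatenate([rng.normal(-2.0, 1.0, 30), rng.normal(2.0, 1.0, 30)])
y = np.array([0] * 30 + [1] * 30)
k = 5

# k nearest neighbours of each point (excluding itself).
dist = np.abs(x[:, None] - x[None, :])
np.fill_diagonal(dist, np.inf)
nn = np.argsort(dist, axis=1)[:, :k]

def log_pseudo_likelihood(beta):
    """p(y_i | y_-i, beta) proportional to exp(beta * fraction of i's k-NN
    sharing label y_i); the pseudo-likelihood multiplies these conditionals."""
    agree = (y[nn] == y[:, None]).mean(axis=1)
    flipped = 1.0 - agree                   # agreement if y_i were flipped
    return float(np.sum(beta * agree
                        - np.logaddexp(beta * agree, beta * flipped)))

# Crude grid maximization of the pseudo-likelihood over beta.
betas = np.linspace(0.0, 5.0, 101)
scores = [log_pseudo_likelihood(b) for b in betas]
best_beta = float(betas[int(np.argmax(scores))])
```

On cleanly separable data the pseudo-likelihood keeps increasing in the interaction strength and the maximizer drifts to the grid boundary, one symptom of the limitations of this approximation that the paper demonstrates on benchmark datasets.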
The statistical mechanics of networks
We study the family of network models derived by requiring the expected
properties of a graph ensemble to match a given set of measurements of a
real-world network, while maximizing the entropy of the ensemble. Models of
this type play the same role in the study of networks as is played by the
Boltzmann distribution in classical statistical mechanics; they offer the best
prediction of network properties subject to the constraints imposed by a given
set of observations. We give exact solutions of models within this class that
incorporate arbitrary degree distributions and arbitrary but independent edge
probabilities. We also discuss some more complex examples with correlated edges
that can be solved approximately or exactly by adapting various familiar
methods, including mean-field theory, perturbation theory, and saddle-point
expansions.
Comment: 15 pages, 4 figures.
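For the independent-edge case, the maximum-entropy ensemble constrained to a given set of expected degrees has edge probabilities p_ij = sigma(theta_i + theta_j). A sketch of fitting the theta_i by gradient ascent on the degree-matching condition, with an invented (assumed feasible) degree sequence:

```python
import numpy as np

def fit_degree_model(target_deg, iters=2000, lr=0.1):
    """Fit the max-entropy random graph with independent edges and given
    expected degrees: p_ij = sigmoid(theta_i + theta_j), i != j."""
    n = len(target_deg)
    theta = np.zeros(n)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] + theta[None, :])))
        np.fill_diagonal(p, 0.0)                 # no self-loops
        # Move each theta_i toward matching its target expected degree.
        theta += lr * (target_deg - p.sum(axis=1))
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] + theta[None, :])))
    np.fill_diagonal(p, 0.0)
    return theta, p.sum(axis=1)

# Hypothetical degree sequence for a 5-node graph.
target = np.array([1.0, 2.0, 2.0, 3.0, 2.0])
theta, expected = fit_degree_model(target)
```

At convergence the ensemble's expected degrees match the constraints while the entropy is maximal, the Boltzmann-distribution analogy the abstract draws.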
Solution of the 2-star model of a network
The p-star model or exponential random graph is among the oldest and
best-known of network models. Here we give an analytic solution for the
particular case of the 2-star model, which is one of the most fundamental of
exponential random graphs. We derive expressions for a number of quantities of
interest in the model and show that the degenerate region of the parameter
space observed in computer simulations is a spontaneously symmetry broken phase
separated from the normal phase of the model by a conventional continuous phase
transition.
Comment: 5 pages, 3 figures.
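The symmetry-broken degenerate region can be illustrated with a generic mean-field self-consistency equation of the form p = sigma(b + c*p), the qualitative shape such equations take for the 2-star model; the coefficients below are illustrative, not the paper's exact expressions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve_mean_field(b, c, p0, iters=200):
    """Fixed-point iteration for the self-consistency equation
    p = sigmoid(b + c * p), where p is the mean edge density."""
    p = p0
    for _ in range(iters):
        p = sigmoid(b + c * p)
    return p

# With a strong coupling c the equation is bistable: different starting
# densities converge to distinct low- and high-density phases, the two
# symmetry-broken solutions separated by the phase transition.
low = solve_mean_field(b=-5.0, c=10.0, p0=0.01)
high = solve_mean_field(b=-5.0, c=10.0, p0=0.99)
```

The coexistence of both stable fixed points (with an unstable one between them) is the mean-field picture of the degenerate region seen in simulations.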
Creation and characterization of vortex clusters in atomic Bose-Einstein condensates
We show that a moving obstacle, in the form of an elongated paddle, can
create vortices that are dispersed, or induce clusters of like-signed vortices
in 2D Bose-Einstein condensates. We propose new statistical measures of
clustering based on Ripley's K-function which are suitable to the small size
and small number of vortices in atomic condensates, which lack the huge number
of length scales excited in larger classical and quantum turbulent fluid
systems. The evolution and decay of clustering is analyzed using these
measures. Experimentally it should prove possible to create such an obstacle by
a laser beam and a moving optical mask. The theoretical techniques we present
are accessible to experimentalists and extend the current methods available to
induce 2D quantum turbulence in Bose-Einstein condensates.
Comment: 9 pages, 9 figures.
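A naive Ripley's K estimate (with no boundary correction; the paper develops measures adapted to the small number of vortices) already distinguishes dispersed from clustered point patterns. The synthetic "vortex positions" below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def ripley_k(points, r, area=1.0):
    """Naive Ripley's K estimate: area-normalised mean number of other
    points within distance r of each point. Under complete spatial
    randomness K(r) is approximately pi * r**2 (edge effects ignored)."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    within = (d <= r).sum() - n              # drop the n self-pairs
    return area * within / (n * (n - 1))

# Dispersed pattern: uniform points in the unit square.
uniform = rng.random((400, 2))

# Clustered pattern: tight groups of like-signed "vortices".
centers = rng.random((8, 2))
clustered = (centers[rng.integers(0, 8, 400)]
             + 0.02 * rng.standard_normal((400, 2)))

r = 0.05
k_uniform = ripley_k(uniform, r)             # near pi * r**2
k_clustered = ripley_k(clustered, r)         # much larger: excess neighbours
```

An excess of K(r) over pi*r^2 at small r signals clustering; the paper's contribution is correcting such measures for the few vortices and small domains of atomic condensates.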
Phase I–II trial design for biologic agents using conditional auto‐regressive models for toxicity and efficacy
Peer reviewed.
https://deepblue.lib.umich.edu/bitstream/2027.42/147824/1/rssc12314_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/147824/2/rssc12314.pd